ABSTRACT

The suggested bat origin of Middle East respiratory syndrome coronavirus (MERS-CoV) has revitalized studies of other bat-derived coronaviruses for their interspecies transmission potential. Bat coronavirus (BatCoV) HKU9 is an important betacoronavirus (betaCoV) that is phylogenetically affiliated with the same genus as MERS-CoV. Bat-surveillance data indicate that BatCoV HKU9 is widely spread and circulating in bats, which highlights the need to characterize the virus for its potential to cross species barriers. The receptor binding domain (RBD) of the coronavirus spike (S) protein recognizes host receptors to mediate virus entry and is therefore a key factor determining viral tropism and transmission capacity. In this study, the putative S RBD of BatCoV HKU9 (HKU9-RBD), which is homologous to other betaCoV RBDs that have been structurally and functionally defined, was characterized via a series of biophysical and crystallographic methods. Using surface plasmon resonance, we demonstrated that HKU9-RBD binds to neither the SARS-CoV receptor ACE2 nor the MERS-CoV receptor CD26. We further solved the atomic structure of HKU9-RBD, which, as expected, is composed of a core and an external subdomain. The core subdomain fold resembles those of other betaCoV RBDs, whereas the external subdomain is structurally unique, containing a single helix, which explains the inability of HKU9-RBD to react with either ACE2 or CD26. By comparing the available RBD structures, we further propose a homologous inter-subdomain binding mode in betaCoV RBDs that anchors the external subdomain to the core subdomain. The revealed RBD features shed light on the evolutionary route of betaCoVs.
Introduction

Coronaviruses are large, enveloped, positive-stranded RNA viruses that can infect birds, animals and humans 7. Taxonomically, these viruses belong to the family Coronaviridae within the order Nidovirales. Since the 1930s, when the first coronavirus, infectious bronchitis virus, was isolated from chickens 4, coronaviruses have been classified into four genera: Alpha-, Beta-, Gamma- and Deltacoronavirus. Of these, betacoronaviruses (betaCoVs) have drawn worldwide attention because of their pathogenic capacity and potential to cause a global pandemic of human infections, and because of their wide spread and enormous species diversity in bats 6, 9. In 2002-2003, one representative betaCoV, severe acute respiratory syndrome coronavirus (SARS-CoV), first emerged in China and then rapidly spread to other countries, leading to >8000 infection cases and >800 deaths. In 2012, another betaCoV, Middle East respiratory syndrome coronavirus (MERS-CoV) 7, was first identified in Saudi Arabia 77. Despite global efforts to control its transmission, MERS-CoV continues to spread and has affected multiple countries in the Middle East, Europe, North America and Asia, causing 1,800 confirmed infections and at least 640 deaths as of June 23rd, 2016 (based on the latest statistical data released by the World Health Organization). Meanwhile, a human-infective betaCoV, HKU1, was isolated from a patient with respiratory disease in Hong Kong 79. These unexpected outbreaks of betaCoV infection have posed a severe threat to global public health and led to enormous socioeconomic disruption.

Phylogenetically, betaCoVs can be further categorized into four evolutionary lineages/subgroups (A, B, C and D). SARS-CoV is a typical lineage B member, while MERS-CoV is grouped in lineage C. Despite belonging to different subgroups, these two betaCoVs likely share similar interspecies transmission routes, "jumping" from their natural host(s) to intermediate adaptive animal(s) and finally to humans. Current evidence clearly shows that SARS-CoV originated from bats and possibly adapted in civets or raccoon dogs 4 before it infected humans. Given the close phylogenetic relationship between MERS-CoV and a variety of bat-derived coronaviruses (BatCoVs) (e.g. HKU4, HKU5 25 and those recently identified in the Middle East, Africa, Europe and Asia), it is widely accepted that the current MERS epidemic represents another bat-to-human transmission event involving a betaCoV, although its intermediate host this time has been shown to be dromedaries. Notably, two recent studies reported that BatCoV HKU4 could recognize human CD26, the MERS-CoV receptor 4, as a functional entry receptor, indicating its potential adaptation for human infection. These continuously occurring yet unpredictable events of betaCoVs crossing species barriers highlight the pressing need to study other members of the genus for characteristics relevant to interspecies transmission.

The coronavirus spike (S) protein, which is located on the surface of the virion envelope, mediates receptor recognition and membrane fusion 7 and is therefore a key factor determining virus tropism for a specific species 27. In most cases, coronaviral S is further cleaved into S1 and S2 subunits, and the receptor-binding capacity is allocated to the S1 subunit 7. The betaCoV receptor binding domain (RBD) that directly engages the receptor is commonly located in the C-terminal half of S1 (C-terminal domain, CTD), as in SARS-CoV 35, MERS-CoV 29,40 and BatCoV HKU4, although in rare cases, such as mouse hepatitis virus (MHV), the RBD was identified in the S1 N-terminal domain (NTD). We previously characterized the structure of the MERS-CoV RBD (MERS-RBD) as a relatively independent entity composed of a core and an external subdomain. The latter subdomain, which is topologically an insertion between two scaffold strands of the core subdomain, presents a flat four-stranded β-sheet surface to contact the CD26 receptor. A similar topological arrangement of the core and external subdomains into a structural unit for receptor engagement was also observed in the SARS-CoV RBD (SARS-RBD) 35. Nevertheless, SARS-RBD exhibits a unique loop-dominated external fold to recognize human angiotensin-converting enzyme 2 (ACE2) as its receptor. These observations indicate that the homologous RBD regions of betaCoVs are a key determinant in receptor adaptation and cross-species transmission.

BatCoV HKU9 is a representative betaCoV of lineage D. The virus was first identified in bats in 2007 by next-generation sequencing (NGS). Although attempts to isolate live virus have so far been unsuccessful, its genomes are widespread in different bat species 43. Despite concerns about its interspecies transmission potential, the features of its S protein, especially of the homologous RBD region (HKU9-RBD), remain unknown. Characterizing them is an indispensable step towards understanding the pathogenesis of BatCoV HKU9. In addition, the atomic structure of HKU9-RBD would provide requisite information for understanding the evolution of betaCoVs. Notably, MERS-RBD and SARS-RBD share a conserved core structure but differ in the external fold used to engage different receptors. Sequence features of betaCoV RBDs suggest that this scheme of subdomain arrangement might extend to the whole betacoronavirus genus, regardless of species. This notion was supported by our recent study on the BatCoV HKU4 RBD (HKU4-RBD), which exhibits a structure closely resembling that of MERS-RBD 35.

In this study, we report the structural and functional characterization of HKU9-RBD. As expected, the solved structure contains a core subdomain homologous to those observed in other betaCoV RBD structures, together with an external subdomain that is mainly α-helical. This unique structural feature explains its inability to react with either human CD26 or ACE2, as observed in our surface plasmon resonance (SPR) assay. By comparing available RBD structures, we further show that the detailed interactions anchoring the external subdomain to the core subdomain share similar patterns in betaCoV RBDs. We believe this core/external interaction mode represents another structural feature of S that has been preserved during betaCoV evolution, in addition to the conserved fold of the core subdomain. Our study therefore further supports the notion that betaCoV S proteins originate from a common ancestor and have evolved divergently, mainly in the RBD external region, to engage different receptors, thereby preparing for potential interspecies transmission.

Materials and Methods

Plasmid construction

The plasmids used for protein expression were individually constructed by inserting the coding sequences for HKU9-RBD (S residues S355-N521, GenBank accession number: EF065513), MERS-RBD (S residues E367-Y606, accession number: JX869050), SARS-RBD (S residues R306-F527, accession number: NC_004718), human CD26 (residues S39-P766, accession number: NP_001926) and human ACE2 (residues S19-D615, accession number: BAJ21180) into the EcoRI and XhoI restriction sites of a previously modified pFastBac1 vector 7, which was engineered to include an N-terminal gp67 signal peptide coding sequence. For each protein, an engineered C-terminal hexa-His tag was used to facilitate purification. To prepare mouse IgG Fc fragment (mFc)-fused proteins, the coding sequences of MERS-RBD, SARS-RBD and HKU9-RBD were each fused with the mFc sequence and then cloned into the pCAGGS vector 35.

Protein expression and purification

The proteins used for crystallization and SPR analysis were prepared with the Bac-to-Bac baculovirus expression system (Invitrogen) according to the manufacturer's instructions. In brief, the verified pFastBac1 vector was transformed into DH10Bac competent cells to generate the recombinant bacmid. The bacmid was then extracted and transfected into Sf9 cells to prepare baculovirus stocks. Sf9 cells were further used to amplify the baculoviruses, while High5 cells were used to express the proteins.

High5 cell cultures were collected 48 h post infection. A total of 4 L of cell culture for each protein was collected and centrifuged at 6,500 rpm for 1.5 h to remove cell debris. After filtration through a 0.22 μm membrane, the supernatant was passed through two 5 ml HisTrap HP columns (GE Healthcare) to capture the protein of interest. For MERS-RBD, SARS-RBD, human CD26 and human ACE2, the bound proteins were eluted from HisTrap with 20 mM, 50 mM and 300 mM imidazole in 20 mM Tris-HCl, 150 mM NaCl buffer (pH 8.0). After SDS-PAGE analysis, fractions eluted with 300 mM imidazole were pooled and further purified on a Superdex® 200 column (GE Healthcare). For HKU9-RBD, the bound proteins were eluted from HisTrap with 20 mM, 50 mM and 300 mM imidazole in 20 mM HEPES, 150 mM NaCl buffer (pH 7.0). Fractions eluted with 50 mM and 300 mM imidazole were pooled separately and dialyzed overnight against 5 L of 20 mM HEPES, 150 mM NaCl buffer (pH 7.0) to remove imidazole. The dialysates were concentrated and further purified on a Superdex® 200 column (GE Healthcare). Each protein was stored in its purification buffer.
To prepare mFc-fused proteins in a mammalian cell expression system, the recombinant pCAGGS plasmids were confirmed by Sanger sequencing and then prepared with the EndoFree Maxi Plasmid Kit (TIANGEN, Beijing). Each recombinant plasmid was transfected into 293T cells with 50 μg plasmid DNA per T75 flask using polyethylenimine (PEI, Polysciences Inc.). After 5 h of incubation, the transfected cells were washed twice with PBS and the medium was replaced with serum-free DMEM. The cells were maintained for three days, after which the supernatant was harvested and replaced with fresh DMEM, and the cells were maintained for another four days. The harvested supernatants were pooled, concentrated and mixed with two volumes of 20 mM trisodium phosphate (pH 7.0). The mixture was passed through a 5 ml HiTrap™ Protein A HP prepacked column (GE Healthcare) to capture the protein of interest. After removal of unbound proteins with 20 mM trisodium phosphate (pH 7.0), the bound protein was eluted from the column with 100 mM glycine (pH 3.0). Each fraction was neutralized with 1 M Tris-HCl (pH 9.0). After SDS-PAGE analysis, the fractions containing the protein of interest were pooled and concentrated. The buffer of each protein was then exchanged to PBS (pH 7.0) for further experiments.

SPR assay

The BIAcore® experiments were carried out at 25 °C using BIAcore® 3000 or BIAcore® T100 instruments with CM5 chips (GE Healthcare). For all measurements, HBS-EP buffer consisting of 10 mM HEPES (pH 7.4), 150 mM NaCl and 0.005% (v/v) Tween® 20 was used, and all proteins were exchanged into this buffer in advance. First, the HKU9-RBD, MERS-RBD and SARS-RBD proteins expressed in insect cells were used for the SPR assay on a BIAcore® 3000 instrument. BSA (negative control), HKU9-RBD, MERS-RBD and SARS-RBD proteins were immobilized on the chip at about 1000 response units (RU) according to the manufacturer's amine-coupling protocol (GE Healthcare). Serial concentrations of human CD26 (0 and 19.5-5000 nM) or human ACE2 (0 and 39-625 nM) were then flowed over the chip surface. After each cycle, the sensor surface was regenerated with a short pulse of 10 mM NaOH. The equilibrium dissociation constants (binding affinity, Kd values) were analyzed using BIAevaluation® software (BIAcore®). To exclude the possibility that HKU9-RBD was non-functional because of immobilization or missing important post-translational modifications, we purified mFc-fused HKU9-RBD protein from mammalian cells and assessed its binding to CD26 or ACE2 using a capture SPR method on a BIAcore® T100 system. The CM5 chip was immobilized with an anti-mouse antibody in flow cells (FC) 1 and 2. The mFc-fused RBD proteins were then injected and captured on FC 2, while FC 1 served as the negative control. Human CD26 or human ACE2 proteins were then injected and the binding responses were measured. The immobilized anti-mouse antibody was regenerated with 10 mM glycine, pH 1.7 (GE Healthcare).
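The Methods state only that Kd values were derived in BIAevaluation®; the fitting model is not specified. To illustrate how an equilibrium dissociation constant can be read from a concentration series like the CD26 series above, the following is a minimal sketch assuming a 1:1 (Langmuir) steady-state model, Req = Rmax·C/(Kd + C), fitted by least squares with Rmax profiled out and a golden-section search over log Kd. The function names are hypothetical, and the responses are synthetic, generated from an assumed Kd chosen to equal the MERS-RBD/CD26 value reported in the Results; they are not the measured sensorgram data.

```python
import math


def steady_state_response(conc_nM, kd_nM, rmax):
    """1:1 Langmuir steady-state response: Req = Rmax * C / (Kd + C)."""
    return rmax * conc_nM / (kd_nM + conc_nM)


def best_rmax(concs, responses, kd_nM):
    """For a fixed Kd the model is linear in Rmax, so least-squares Rmax is closed-form."""
    f = [c / (kd_nM + c) for c in concs]
    return sum(y * fi for y, fi in zip(responses, f)) / sum(fi * fi for fi in f)


def sse(concs, responses, kd_nM):
    """Sum of squared residuals after profiling out Rmax."""
    rmax = best_rmax(concs, responses, kd_nM)
    return sum((y - steady_state_response(c, kd_nM, rmax)) ** 2
               for c, y in zip(concs, responses))


def fit_kd(concs, responses, lo_nM=1e-3, hi_nM=1e6, iters=200):
    """Golden-section search for Kd on a log10 scale; returns (Kd in nM, Rmax)."""
    a, b = math.log10(lo_nM), math.log10(hi_nM)
    g = (math.sqrt(5) - 1) / 2
    x1, x2 = b - g * (b - a), a + g * (b - a)
    f1, f2 = sse(concs, responses, 10 ** x1), sse(concs, responses, 10 ** x2)
    for _ in range(iters):
        if f1 < f2:  # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - g * (b - a)
            f1 = sse(concs, responses, 10 ** x1)
        else:        # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + g * (b - a)
            f2 = sse(concs, responses, 10 ** x2)
    kd = 10 ** ((a + b) / 2)
    return kd, best_rmax(concs, responses, kd)


# Synthetic responses for a two-fold series from 5000 nM down to about 19.5 nM.
concs = [5000 / 2 ** k for k in range(9)]
responses = [steady_state_response(c, kd_nM=52.8, rmax=120.0) for c in concs]
kd, rmax = fit_kd(concs, responses)
print(f"Kd = {kd:.1f} nM, Rmax = {rmax:.1f} RU")
```

Profiling out Rmax keeps the search one-dimensional. A kinetic 1:1 fit, in which Kd = koff/kon, is an equally common alternative and may be what the software actually applied.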

Crystallization

Crystallization trials were performed by mixing 1 μL protein with 1 μL reservoir solution and equilibrating against 100 μL reservoir solution at 4 °C using the sitting-drop vapor-diffusion method. Initial crystallization conditions were screened using commercially available kits. Diffraction-quality crystals of HKU9-RBD were finally obtained in a condition consisting of 0.1 M sodium citrate tribasic dihydrate pH 7.0 and 12% PEG 20,000, at a protein concentration of 2.2 mg/mL. Derivative crystals were obtained by soaking the crystals in reservoir solution containing 1 mM KAuBr4·2H2O for 48 h at 4 °C.

Data collection, integration and structure determination

For data collection, all crystals were flash-cooled in liquid nitrogen after a brief soak in reservoir solution supplemented with 20% (v/v) glycerol. Diffraction data for the native (wavelength: 1.03906 Å) and Au-derivative (wavelength: 1.03906 Å) crystals of HKU9-RBD were collected at beamline BL17U of the Shanghai Synchrotron Radiation Facility (SSRF). All data were processed with HKL2000. Ice rings formed during flash cooling were excluded during data processing, and the final overall completeness of the data set is 97.1%.

The structure of HKU9-RBD was determined by the single-wavelength anomalous diffraction (SAD) method. After the Au sites were located by SHELXD 50 using the Au-SAD data, the identified positions were refined and the phases were calculated with the SAD experimental phasing module of PHASER 57. Real-space constraints were further applied to the electron density map in DM. The initial model was built with AutoBuild in the PHENIX package. Additional missing residues were added manually in COOT 54. The final model was refined with phenix.refine in PHENIX, with energy minimization, isotropic ADP refinement and bulk solvent modeling. The stereochemical quality of the final model was assessed with MolProbity. The Ramachandran plot distribution for the residues in the HKU9-RBD structure was 94.64%, 5.36% and 0% for the favored, allowed and outlier regions, respectively. Data collection and refinement statistics are summarized in Table 1. All structural figures were generated using PyMOL (http://www.pymol.org).
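Rwork and Rfree, reported in Table 1 and in the Results, are the same crystallographic R factor, R = Σ| |Fobs| - |Fcalc| | / Σ|Fobs|, evaluated on different reflection subsets: Rwork on the reflections used in refinement and Rfree on a small test set excluded from it. The following is a minimal sketch, assuming the calculated amplitudes are already on the same scale as the observed ones (phenix.refine additionally handles scaling and bulk-solvent contributions, which are omitted here). The function names, amplitudes and free-set flags are hypothetical toy values.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


def r_work_r_free(f_obs, f_calc, free_flags):
    """Split reflections into the working set (flag False) and the test set (flag True)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


# Toy amplitudes (assumed already scaled); the last reflection is in the free set.
f_obs = [100.0, 200.0, 150.0, 50.0, 80.0]
f_calc = [90.0, 210.0, 150.0, 55.0, 60.0]
free_flags = [False, False, False, False, True]
rw, rf = r_work_r_free(f_obs, f_calc, free_flags)
print(f"Rwork = {rw:.4f}, Rfree = {rf:.4f}")
```

Because the free reflections never drive refinement, an Rfree only modestly above Rwork, as for HKU9-RBD, is the usual sign that the model has not been overfitted.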

Results

HKU9-RBD does not bind either the SARS-CoV or the MERS-CoV receptor

We first characterized the sequence of BatCoV HKU9 S using a series of bioinformatic methods. This 1274-residue protein exhibits typical features of coronavirus S proteins (e.g. the presence of characteristic heptad repeats 1 and 2 in the S2 subunit), although no S1/S2 cleavage site potentially processed by furin-like proteases was detected (Fig. 1A). Over the full length of the protein, the amino acid sequence identity between BatCoV HKU9 S and other betaCoV S proteins is rather limited (e.g. 27.9% to MERS-CoV S, 28.0% to HKU4 S and 30.4% to SARS-CoV S). Nevertheless, we were able to identify the RBD region based on the characteristic cysteine residues of the core subdomain (Fig. 1B), which, in the RBD structures available so far, form three conserved disulphide bonds stabilizing the core fold. HKU9-RBD was accordingly assigned to the S region spanning residues 355-521 (Fig. 1A). Compared with other RBD sequences, HKU9-RBD has a comparable length in the core subdomain (Fig. 1B) but is dramatically shortened in the external region (Fig. 1C).
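Percent identity depends on how gapped alignment columns are counted, and the text does not state the convention behind the 27.9%, 28.0% and 30.4% values. The sketch below assumes one common convention: identical positions divided by the alignment columns in which neither sequence has a gap. The function name and the toy alignment are illustrative only and are not taken from the S-protein alignment.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identity over columns where neither aligned sequence has a gap ('-')."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue.
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x.upper() == y.upper())
    return 100.0 * identical / len(pairs)


# Toy pairwise alignment: 6 gap-free columns, 4 of them identical.
print(f"{percent_identity('ACDE-FG', 'ACDQKFH'):.1f}%")
```

Dividing instead by the full alignment length or by the length of the shorter sequence gives lower values for gapped alignments, so identities are only comparable when computed under the same convention.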

To test whether HKU9-RBD could react with either the SARS-CoV receptor ACE2 or the MERS-CoV receptor CD26, the RBD and receptor ectodomain proteins were individually prepared in insect cells and purified to homogeneity. The ligand/receptor interaction was then characterized by SPR (BIAcore®) by flowing ACE2 or CD26 over the immobilized RBD proteins. As expected, potent interactions were observed for both the SARS-RBD/ACE2 (Kd value: 0.265 μM) (Fig. 2A) and the MERS-RBD/CD26 (Kd value: 52.8 nM) (Fig. 2B) binding pairs. These affinities were very similar to those reported in previous studies, validating the integrity of our testing system. Under the same conditions, however, neither ACE2 (Fig. 2C) nor CD26 (Fig. 2D) interacted with HKU9-RBD. To exclude the possibility that HKU9-RBD was non-functional because of immobilization or missing important post-translational modifications, we purified mFc-fused HKU9-RBD protein from mammalian (293T) cells and assessed its binding to CD26 or ACE2 using a capture SPR method. Again, there was no detectable binding of mFc-fused HKU9-RBD to ACE2 or CD26 (Fig. 1G), whereas mFc-fused SARS-RBD bound well to ACE2 (Fig. 1E) and mFc-fused MERS-RBD bound well to CD26 (Fig. 1F). BatCoV HKU9, therefore, could utilize neither the SARS-CoV receptor nor the MERS-CoV receptor for cell entry, suggesting that it relies on a different, as yet unidentified, cellular receptor.

Crystal structure of HKU9-RBD

We further set out to investigate the structural features of HKU9-RBD by crystallography. The protein was successfully crystallized, a dataset to 2.1 Å resolution was collected (Table 1), and the structure was solved by the single-wavelength anomalous diffraction (SAD) method. The solved structure, with Rwork = 0.1700 and Rfree = 0.2006, contains a single molecule in the crystallographic asymmetric unit. Clear electron density was traced for 166 consecutive HKU9-RBD residues, extending from S355 to A520. These amino acids fold into a compact structure that can be further divided into two subdomains, as in the other RBD structures. The core subdomain comprises eight β-strands and six helices (α or 3₁₀). Five long strands (βc1-βc5) are arranged in an antiparallel manner, forming the scaffold center of the core (core-center). This core-center sheet is further wrapped by surface helices and loops. Notably, the six helices (H1-H6) are sporadically distributed on the two faces of the sheet, giving the core subdomain an overall globular fold. On one lateral side of the core-center sheet, the external subdomain covers the core like a hat, while on the distal opposite side, three small strands (βp1-βp3) constitute a parallel peripheral sheet (core-peripheral) that holds the N- and C-termini of HKU9-RBD in close proximity. As expected, the characteristic cysteine residues (Fig. 1B) form three disulphide bonds in the core subdomain, further stabilizing the core structure from the interior. Of these, two (C357/C381 and C411/C517) are located in the core-peripheral region, contributing to the orientation of the RBD termini, and one (C399/C452) resides in the core-center, linking strands βc2 and βc4 (Fig. 3). Overall, the residue boundaries of the core subdomain observed in the structure are consistent with those deduced from the sequence alignment (Fig. 1B).
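Disulphide bonds such as C357/C381, C411/C517 and C399/C452 are typically identified in a model from the distance between cysteine SG atoms, which is about 2.05 Å for a bonded pair. The following is a minimal sketch, assuming a 2.5 Å cutoff (an illustrative choice, not a value from this study) and using invented coordinates with generic labels rather than the HKU9-RBD model; the function name is hypothetical.

```python
import math
from itertools import combinations


def find_disulfides(sg_coords, cutoff=2.5):
    """Return Cys pairs whose SG atoms lie within `cutoff` angstroms.

    sg_coords maps a residue label to the (x, y, z) position of its SG atom.
    """
    pairs = []
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(sorted(sg_coords.items()), 2):
        if math.dist(xyz_a, xyz_b) <= cutoff:
            pairs.append((res_a, res_b))
    return pairs


# Invented toy coordinates: three SG pairs about 2.05 Å apart, far from each other.
toy_sg = {
    "CysA": (0.0, 0.0, 0.0), "CysB": (2.05, 0.0, 0.0),
    "CysC": (10.0, 0.0, 0.0), "CysD": (10.0, 2.03, 0.0),
    "CysE": (20.0, 0.0, 0.0), "CysF": (20.0, 0.0, 2.04),
}
print(find_disulfides(toy_sg))
```

A pure distance test does not check S-S-C angles or alternate conformations; structure-validation tools apply stricter geometric criteria.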

The external subdomain of HKU9-RBD consists of 42 residues from L458 to V499 (Fig. 1C). These amino acids extend out of strand βc4 of the core-center sheet, first run as a loop along the core subdomain like a clamp, then fold back to form a solvent-exposed α-helix (H1') and finally proceed into core strand βc5 (Fig. 3). This structure differs dramatically from those of SARS-RBD and MERS-RBD, which are devoid of any helical components in the external region. The unique external fold of HKU9-RBD could well explain its inability to bind either ACE2 or CD26.

Structural conservation of the RBD core subdomain in betaCoVs

Three betaCoV RBD structures have been reported previously: one lineage B structure (SARS-RBD 55) and two lineage C structures (MERS-RBD 59 and HKU4-RBD). These structures indicate interspecies conservation of the core fold among betaCoVs. BatCoV HKU9 is a representative member of betaCoV lineage D. We therefore compared the currently available RBD structures with the HKU9-RBD structure solved in this study. As expected, significant similarity was observed in the core subdomain (Figs. 4A-D). Superimposition of the core structures yielded root-mean-square deviation (rmsd) values ranging from 0.66 to 2.82 Å (Table 2), demonstrating a closely resembling core fold among the four RBDs despite their low sequence identity. The most conserved part was the core-center sheet: this five-stranded scaffold element, as well as its single inter-strand disulphide bond, is invariably preserved in all the structures. In the core-peripheral region, however, small variations in strand composition were noted. HKU9-RBD contains three short β-strands in this region, arranged into a small parallel β-sheet. Both SARS-RBD and MERS-RBD retain two of these strands, whereas HKU4-RBD is devoid of any detectable strand elements in this region. Despite these differences in strand composition, the core-peripheral regions of the RBDs exhibit similar orientations and share the same scheme of bringing the domain N-terminus into close proximity to its C-terminus. Additional common features of the core-peripheral region are its two disulphide bonds, which are structurally and topologically conserved in the four structures (Figs. 4A-D).
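The rmsd values in Table 2 summarize how far paired atoms remain apart after two core subdomains are superimposed: RMSD = sqrt((1/N) Σ |a_i - b_i|²) over N paired atoms. The sketch below computes only this final quantity and assumes the atoms are already paired (for example, matched C-alpha atoms) and already superimposed; the least-squares fitting step (for example, the Kabsch algorithm) and the choice of aligned residues, which the text does not specify, are outside its scope. The function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD = sqrt((1/N) * sum_i |a_i - b_i|^2) over N paired, pre-superimposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty, equally sized coordinate lists")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Toy example: every paired atom is displaced by 1 Å along z.
model_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
model_2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 1.0, 1.0)]
print(f"rmsd = {rmsd(model_1, model_2):.2f} Å")
```

Because the value depends on which residues are paired, rmsd values from different comparisons are directly comparable only when the aligned residue sets are defined consistently.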

In contrast to the conserved core, the external subdomains of the four RBDs are structurally divergent. HKU9-RBD presents a single helix (H1') in the external region, whereas SARS-RBD is loop-dominated but contains two extra small β-strands. The external subdomains of MERS-RBD and HKU4-RBD, however, resemble each other and consist predominantly of a rigid β-sheet composed of four β-strands. Despite these structural differences, the external subdomains are clearly topological equivalents in these structures, each being present as an insertion between two core-center strands (Figs. 4A-D).

Homologous interaction mode anchoring the external subdomain to the core subdomain

By superimposing the available RBD structures, we unexpectedly identified two major elements in the external subdomain that could be well aligned (Fig. 5A). The first element (element 1) spans about seven residues (Y464-F470 in HKU9-RBD, Y438-R444 in SARS-RBD, Y497-C503 in MERS-RBD and Y501-C507 in HKU4-RBD) (Figs. 1C and 5C) and runs along the core subdomain surface to be lodged between helices H2 and H6 (based on the secondary-structure definitions of HKU9-RBD) (Fig. 5A). The second element (element 2) contains eight amino acids (P471-Q478 in HKU9-RBD, K447-D454 in SARS-RBD, L517-S524 in MERS-RBD and Y522-S529 in HKU4-RBD) (Figs. 1C and 5C), extending as a curved loop that covers helix H6 of the core subdomain (Fig. 5A). Interestingly, these two elements are "saddled" upon the core helices, anchoring the external subdomain to the core subdomain (Fig. 5A). We therefore further explored the amino acid interaction details at this core/external interface.
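The position markers used below are assigned in sequence order within each aligned element, so equal-length spans allow the same letter to be compared across the four RBDs. The following sketch reproduces that bookkeeping from the residue spans listed above; it assumes each element aligns without internal gaps, which is consistent with all four element 1 spans having seven residues and all element 2 spans having eight. The dictionary and function names are hypothetical.

```python
from string import ascii_lowercase

# Element spans as listed in the text: (first residue, last residue), inclusive.
ELEMENTS = {
    "element 1": {"HKU9-RBD": (464, 470), "SARS-RBD": (438, 444),
                  "MERS-RBD": (497, 503), "HKU4-RBD": (501, 507)},
    "element 2": {"HKU9-RBD": (471, 478), "SARS-RBD": (447, 454),
                  "MERS-RBD": (517, 524), "HKU4-RBD": (522, 529)},
}


def position_markers(first, last):
    """Map each residue number in an inclusive span to a position letter a, b, c, ..."""
    length = last - first + 1
    if not 0 < length <= len(ascii_lowercase):
        raise ValueError("span must contain 1-26 residues")
    return {first + i: ascii_lowercase[i] for i in range(length)}


# Print the residue numbers that carry each marker in every RBD.
for element, spans in ELEMENTS.items():
    for rbd, (first, last) in spans.items():
        markers = position_markers(first, last)
        print(element, rbd, " ".join(f"{m}={n}" for n, m in markers.items()))
```

For example, the element 1 a-residue maps to Y464, Y438, Y497 and Y501, matching the invariant tyrosine described in the next paragraph.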

Each element residue was scrutinized for both side-chain orientation and inter-subdomain interactions. To facilitate analysis and comparison, each amino acid in the two elements was assigned a position marker (a-g for element 1 and a-h for element 2) (Figs. 5B and 5C). In element 1, the a-residue is invariably a tyrosine in the four RBDs. This amino acid orients its bulky side chain towards the core subdomain, providing strong hydrophobic contacts. An extra side-chain H-bond was also observed at this position in HKU9-RBD. The b-residue extends away from the core surface and shows little conservation; however, it invariably contributes to subdomain anchoring by providing a main-chain H-bond. Following b, the amino acids preferentially face towards the core at position c and lie parallel to the core surface at position d. Multiple van der Waals (vdW) contacts and conserved main-chain H-bonds were observed at these two positions, respectively. A small discrepancy was seen in SARS-RBD, which orients its c-residue outwards into the bulk solvent. The remaining three element 1 amino acids, at positions e, f and g, are distant from the core subdomain and completely solvent exposed, and therefore contribute little to the core/external interactions (Figs. 5B and 5C).

In element 2, both the a- and d-residues are oriented parallel to the surface of the core subdomain. This configuration allows these amino acids to provide apolar vdW contacts that strengthen the core/external subdomain binding. A certain degree of amino acid conservation was observed at position d, where a proline is favored to facilitate the turning of the loop. Immediately following these two positions, the b- and e-residues insert their side chains into two surface pockets of the core subdomain. At position b, the residue conservatively carries a hydrophobic, medium-sized side chain (Val/Ile/Leu). It is accommodated in a shallow apolar pocket, creating strong packing forces that bind the core and external subdomains together. In addition, this residue contributes a main-chain H-bond to subdomain binding. For the e-residue, the accommodating pocket is deep and large, allowing amino acid variation at this position (Gly in HKU9- and HKU4-RBD, Phe in SARS-RBD and Asn in MERS-RBD). In the four RBDs, this e-residue invariably H-bonds with a core subdomain residue via a main-chain atom, but may optionally provide side-chain H-bond interactions (e.g. in MERS-RBD) or multiple vdW contacts (e.g. in SARS-RBD). Additional core/external interactions in this region were observed at position g, where the residue is oriented either parallel to or towards the core subdomain and thereby contributes to binding via hydrophobic and side-chain H-bond interactions. It is also of interest that these important interface residues of element 2 are regularly interspersed with amino acids at positions c, f and h, which are solvent exposed and rarely interact with the core subdomain (Figs. 5B and 5C).

In summary, the four coronaviral RBD structures present homologous amino acid interaction patterns for inter-subdomain binding. The binding relies mainly on two elements in the external subdomain, which are similarly oriented in these structures (Fig. 5A). Despite limited conservation of the element sequences, the side-chain orientation and interaction mode (hydrophobic, vdW or H-bond contacts) at each position are, in most cases, similar or homologous (Fig. 5C).

Discussion

Bats have been implicated as harboring the largest natural gene pools of new coronaviruses and coronaviral genes. The majority of betaCoVs can be traced back to bats for their origin. For example, a recent study isolated from Chinese horseshoe bats a live SARS-like coronavirus that can utilize the SARS-CoV receptor ACE2 for cell entry, thereby providing the strongest evidence for the bat origin of this pandemic human pathogen. In addition, two studies reported the identification in bats of gene fragments that are almost identical to those of MERS-CoV, indicating that MERS-CoV likely also originated from bats. Given recent reports showing the adaptation of BatCoV HKU4 for binding to human cells via recognition of CD26, there is an urgent need to prepare for unforeseeable events of potential interspecies transmission by other bat-derived betaCoVs. BatCoV HKU9 is an important lineage D betaCoV 77 and has been shown to be widespread and circulating in different bat species 45. Given the determinative role of the coronaviral S RBD in crossing species barriers (as has been structurally illustrated for other coronaviruses), we characterized the structural and functional features of the homologous RBD of BatCoV HKU9 S. The solved structure revealed a core subdomain resembling those observed in SARS-, MERS- and HKU4-RBD, but a unique external subdomain containing a single helix. Since the RBD external subdomain contains the key motifs that interface with the receptor (denoted the receptor binding motif, RBM), the unique external fold of HKU9-RBD accords well with our functional data showing its inability to react with either ACE2 or CD26. It remains to be investigated which host molecule HKU9-RBD recognizes as a functional cell entry receptor. Nevertheless, given the single helical component in the external subdomain of HKU9-RBD, the RBM is expected to be located on the solvent-accessible side of this helix, which might facilitate future attempts at receptor identification.

It should be noted that coronavirus RBDs are not necessarily located in the C-terminal half of the S1 subunit. Previous structural and mutagenesis data showed that the RBD of MHV S is located in the S1 N-terminal half (NTD). Nevertheless, current data seem to favor the notion that the CTD takes priority over the NTD as the receptor binding entity, as the majority of coronaviruses (e.g. SARS-CoV, MERS-CoV, batCoV HKU4, human coronavirus NL63 and transmissible gastroenteritis virus) harbor a CTD as the RBD. Interestingly, all currently available betaCoV CTD/RBD structures clinch the N- and C-termini on the same side, opposite to where the external subdomain is located. This arrangement would place the S1 N-terminal half sterically underneath the C-terminal half, thereby projecting the CTD away from the viral envelope for a trans-interaction with the receptor. Our structural study demonstrated that HKU9-RBD retains the same feature and is therefore more likely to be the authentic receptor binding entity.

In comparison with SARS-, MERS- and HKU4-RBD, whose structures are available, HKU9-RBD differs in the external fold but retains a similar core subdomain structure. The most conserved part is the core-center sheet, which is composed of five anti-parallel strands and serves as the scaffold of the core subdomain. This sheet is sterically and structurally conserved in all the RBD structures. Additional conserved elements include the core-center helices and the core-peripheral region. Nevertheless, these elements can vary in their secondary structure composition. For example, a recent study of the HKU4-RBD structure showed that its N-terminal-most part does not fold into the characteristic helix observed in the SARS- and MERS-RBD structures. In the core-peripheral region, the number of strands varies from zero (HKU4-RBD) to three (HKU9-RBD) (Fig. 4). This considerably complicates the nomenclature of the RBD secondary structure elements, and the situation becomes even more complex when the external subdomains, which can vary significantly in structure, are taken into account. We therefore suggest designating the core-center strands and helices as βcs (βc1-βc5) and Hs (H1, H2, etc.), respectively, the core-peripheral strands as βps (βp1, βp2, etc.) and the external elements as H′s (H1′, H2′, etc.) or β′s (β1′, β2′, etc.). This terminology should facilitate the comparison of homologous RBD structures and reflects the fact that the external subdomain is topologically an insertion between two equivalent core-center strands.

The long evolutionary history, high mutation rates and diverse genetic strategies of RNA viruses often make studies of virus origin difficult. Tracking evolutionary traces is even more difficult in viral surface proteins, which are normally under strong evolutionary pressure. Evolutionary records, however, are more likely to be conserved in tertiary structures than in amino acid sequences. In betaCoVs, the inter-lineage sequence identity of the S RBD is rather limited. Nevertheless, we observed several conserved features in the betaCoV RBD structures: 1) a conserved core-center serving as the scaffold of the core subdomain; 2) a similar core-peripheral region where the RBD termini are clinched in proximity; 3) a similar topological arrangement of the external subdomain as an insertion between two core strands; and 4) a homologous inter-subdomain binding mode anchoring the external subdomain to the core subdomain. These features indicate a common ancestral S protein that diverged into the forms found in different species. During evolution, the core subdomain was structurally conserved, whereas the external subdomain folded into variant structures to engage different receptors. It is also noteworthy that the aforementioned core features have been structurally validated in betaCoV lineages B (SARS-RBD), C (MERS- and HKU4-RBD) and D (HKU9-RBD) but not yet in lineage A. Structural studies on the equivalent S RBDs of lineage A members should be conducted in the future.


FIGURES 

Figure 1. Sequence features of HKU9-RBD. (A) A schematic representation of BatCoV HKU9 S. The indicated domain elements were defined either from pairwise sequence alignments or from bioinformatic predictions. The signal peptide (SP), transmembrane domain (TM) and heptad repeats 1 and 2 (HR1 and HR2) were predicted with the SignalP 4.0 server, the TMHMM server and the LearnCoil-VMF program, respectively, whereas the N-terminal domain (NTD) and RBD were deduced by alignment with the N-terminal galectin-like domain of murine hepatitis virus S and with MERS-RBD, respectively. The S1/S2 site potentially cleaved by furin-like proteases could not be ascertained and is therefore labeled with a question mark. (B, C) Structure-based alignment of the HKU9-, SARS-, MERS- and HKU4-RBD sequences. Arrows and spiral lines indicate strands and helices, respectively; these secondary structure elements are labeled as in Fig. 3. The conserved cysteine residues that form three disulphide bonds in the structures are marked with the Arabic numerals 1-3. The core subdomain is conserved among the four RBD structures, but the external subdomains are structurally unrelated; the sequences are therefore presented separately. The two elements that anchor the external subdomain to the core subdomain are highlighted with black boxes. (B) Core subdomain sequence. (C) External subdomain sequence.


Figure 2. Characterization of HKU9-RBD by SPR assays. The indicated RBD proteins expressed in insect cells were immobilized on CM5 chips and tested for binding to gradient concentrations of human ACE2 or CD26 using a BIAcore® 3000 instrument. The recorded kinetic profiles are shown. (A) Human ACE2 to SARS-RBD. (B) Human CD26 to MERS-RBD. (C) Human ACE2 to HKU9-RBD. (D) Human CD26 to HKU9-RBD. HKU9-RBD binds neither ACE2 nor CD26, whereas SARS-RBD and MERS-RBD bind their respective receptors. We further purified mFc-fused RBD proteins expressed in mammalian (293T) cells and assessed their binding to CD26 or ACE2 by a capture SPR method on a BIAcore® T100 system. Anti-mouse antibodies were immobilized on CM5 chips, and the mFc-fused RBD proteins were captured (3 µg/mL for 60 s) by the antibodies and then tested for binding to human ACE2 or CD26. (E) The mFc-fused SARS-RBD (SARS-RBD-mFc) bound ACE2 well but did not bind CD26. (F) The mFc-fused MERS-RBD (MERS-RBD-mFc) bound CD26 well but did not bind ACE2. (G) The mFc-fused HKU9-RBD (HKU9-RBD-mFc) bound neither ACE2 nor CD26.


Figure 3. Crystal structure of HKU9-RBD. The core and external subdomains are colored magenta and green, respectively. The core subdomain is further divided into a center region (core-center) and a peripheral region (core-peripheral), which are circled for emphasis. The core-center strands and helices are labeled βc1-βc5 and H1-H6, respectively, while the core-peripheral strands are marked βp1-βp3. The disulphide bonds and the RBD termini are labeled. In the right panel, the core subdomain is shown as a surface to highlight how the external subdomain sits on top of it like a hat.


Figure 4. Structural and topological comparison of the betaCoV RBD structures available thus far. Four structures, those of HKU9-, SARS-, MERS- and HKU4-RBD, are oriented similarly and presented side by side as cartoons. The core-center, core-peripheral and external subdomain are encircled in yellow. For each structure, the topological arrangement of the core-center and core-peripheral strands and of the external components is depicted. The core strands that flank the external subdomain are colored red and blue, respectively. Yellow lines indicate disulphide bonds. The N- and C-termini are highlighted. (A) HKU9-RBD. (B) SARS-RBD. (C) MERS-RBD. (D) HKU4-RBD. The panels illustrate the similar topological arrangement of the external subdomain as an insertion between two core strands.
Figure 5. Homologous inter-subdomain amino acid interactions anchoring the external subdomain to the core subdomain. (A) Superimposition of the betaCoV RBD structures (HKU9-RBD in green, SARS-RBD in yellow, MERS-RBD in blue and HKU4-RBD in cyan), highlighting the external elements that can be well aligned. These two elements, of 7 (element 1) and 8 (element 2) amino acids, mainly engage the core subdomain helices H2 and H6 in the inter-subdomain interactions. To facilitate comparison, the element residues were successively assigned position markers (a-g for element 1 and a-h for element 2), which are highlighted. (B) Contributions of the element residues to the inter-subdomain binding. The two external elements are presented as cartoons, while the core subdomain is shown as a surface. At each position, the residue is annotated sequentially with the position marker, the amino acid identity and number, the interaction mode/type and the side-chain orientation. For the interaction mode, hydrophobic or van der Waals interactions are indicated with encircled Ps, side-chain H-bonds with encircled Ss and main-chain H-bonds with encircled Ms. Side-chain orientations are indicated with arrows. (C) Summary of the inter-subdomain interactions specified in (B). The element sequences of the four RBDs are aligned and listed. "+" indicates that a given type of interaction is commonly observed at the position, while "+/-" indicates that the interaction type is specific to some but not all of the four RBDs. The arrows mark the side-chain orientations.
Table 1. Data collection and refinement statistics.

                                HKU9-RBD (PDB code 5GYQ)    Au-derivative HKU9-RBD
Data collection
  Space group                   P2₁                         P1
  Wavelength (Å)                1.03906                     1.03906
  Unit cell dimensions
    a, b, c (Å)                 42.7, 36.0, 62.9            36.0, 46.6, 57.3
    α, β, γ (°)                 90.0, 102.7, 90.0           80.4, 88.8, 88.5
  Resolution (Å)ᵃ               50.00-2.10 (2.18-2.10)      50.00-2.48 (2.57-2.48)
  Observed reflections          101,588                     52,401
  Completeness (%)              97.1 (80.7)                 97.7 (96.9)
  Redundancy                    9.4 (9.4)                   4.1 (3.7)
  Rmerge (%)ᵇ                   6.1 (15.4)                  9.2 (39.0)
  I/σI                          34.023 (12.057)             16.960 (4.637)
  CC1/2                         0.998 (0.988)               0.986 (0.915)
Refinement
  Resolution (Å)                41.7-2.10
  Number of reflections         10,811
  Completeness for range (%)    97.0
  Rwork/Rfreeᶜ                  0.1700/0.2006
  No. atoms
    Protein                     1367
    Water                       128
  B-factors (Å²)
    Protein                     28.7
    Water                       34.1
  R.m.s. deviations
    Bond lengths (Å)            0.003
    Bond angles (°)             0.820
  Ramachandran plotᵈ
    Favored (%)                 94.64
    Allowed (%)                 5.36
    Outliers (%)                0.00

ᵃ Values for the outermost resolution shell are given in parentheses.
ᵇ $R_{\mathrm{merge}} = \sum_{hkl}\sum_{i}\lvert I_i(hkl) - \langle I(hkl)\rangle\rvert \,/\, \sum_{hkl}\sum_{i} I_i(hkl)$, where $I_i$ is the observed intensity and $\langle I\rangle$ is the average intensity from multiple measurements.
ᶜ $R_{\mathrm{work}} = \sum_{hkl}\bigl\lvert\,\lvert F_o\rvert - \lvert F_c\rvert\,\bigr\rvert \,/\, \sum_{hkl}\lvert F_o\rvert$, where $F_o$ and $F_c$ are the structure-factor amplitudes from the data and the model, respectively. $R_{\mathrm{free}}$ is the R factor for a subset (5%) of reflections that was selected prior to refinement calculations and was not included in the refinement.
ᵈ Ramachandran plots were generated using the program MolProbity.
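The two agreement statistics defined in the footnotes can be made concrete with a short sketch. It assumes intensities are already grouped by unique reflection, that $F_c$ is already on the same scale as $F_o$ (refinement programs handle scaling themselves), and that the 5% test set is drawn at random; the function names are hypothetical and this is not the processing or refinement software used for the structure.

```python
import random
from statistics import fmean


def r_merge(measurements):
    """Rmerge = sum_hkl sum_i |I_i - <I>| / sum_hkl sum_i I_i.

    measurements: one list per unique reflection hkl, holding its repeated
    intensity observations I_i.
    """
    num = 0.0
    den = 0.0
    for obs in measurements:
        mean_i = fmean(obs)  # <I(hkl)> for this reflection
        num += sum(abs(i - mean_i) for i in obs)
        den += sum(obs)
    return num / den


def r_factor(f_obs, f_calc):
    """R = sum ||Fo| - |Fc|| / sum |Fo|, assuming Fc is already scaled to Fo."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


def pick_free_set(n_reflections, fraction=0.05, seed=0):
    """Flag roughly 5% of reflections as the test set before refinement.

    Random selection is an assumption; some programs use other schemes.
    """
    rng = random.Random(seed)
    return [rng.random() < fraction for _ in range(n_reflections)]


def r_work_r_free(f_obs, f_calc, is_free):
    """Rwork over reflections used in refinement, Rfree over the held-out set."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free
```

Table 1 reports Rmerge as a percentage, so the value returned by `r_merge` would be multiplied by 100 for comparison, whereas Rwork/Rfree are given as fractions.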
Table 2. Statistics of the core subdomain deviations among the betaCoV RBD structures available thus far.

              HKU9-RBD    MERS-RBD           SARS-RBD           HKU4-RBD
HKU9-RBD      -           2.07 Å (104 Cα)    1.92 Å (100 Cα)    1.37 Å (94 Cα)
MERS-RBD                  -                  2.82 Å (75 Cα)     0.66 Å (109 Cα)
SARS-RBD                                     -                  1.95 Å (82 Cα)
HKU4-RBD                                                        -

The RBD core subdomain structures were superimposed pairwise in PyMOL to calculate the root-mean-square deviation (rmsd) values listed in the table. The values in parentheses indicate the number of equivalent Cα atoms selected for the rmsd calculations.
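The pairwise values in Table 2 are rmsds after optimal superposition of equivalent Cα atoms. As a minimal sketch of that calculation, assuming the equivalent Cα pairs have already been chosen, the Kabsch algorithm below finds the best rotation and reports the rmsd. The function name is hypothetical. PyMOL's `align` and `super` commands additionally perform residue matching and, by default, iterative outlier rejection, which may be why the number of Cα atoms differs between pairs; this sketch does not reproduce those steps.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) sets of paired Cα coordinates after optimal superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    # Remove translation by centering both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # The SVD of the 3x3 covariance matrix gives the optimal rotation (Kabsch).
    h = p_c.T @ q_c
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate p onto q and average the squared distances over the N atom pairs.
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

Because the superposition removes rigid-body differences, two copies of the same coordinates that differ only by a rotation and translation give an rmsd of essentially zero.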
Figure 1. Sequence features of HKU9-RBD

ACS Paragon Plus Environment
Biochemistry

Figure 2. Characterization of HKU9-RBD by SPR assays. [Figure panels include CD26 to MERS-RBD, ACE2 to HKU9-RBD, and a capture assay of 3 µg/mL HKU9-RBD-mFc with 5 µM ACE2 and 5 µM CD26. Axes: Response Units (RU) versus Time (s). Analyte concentrations: two-fold dilutions up to 5 µM.]
Figure 3. Crystal structure of HKU9-RBD

Figure 4. Structural and topological comparison of thus-far available betaCoV RBD structures

Figure 5. Homologous inter-subdomain amino acid interactions anchoring the external subdomain to the core subdomain